AITopics

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.27)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
Europe > Russia (0.14)
(92 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)
Questionnaire & Opinion Survey (1.00)
(2 more...)

Industry:

Media (1.00)
Leisure & Entertainment (1.00)
Information Technology > Security & Privacy (1.00)
(10 more...)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Information Management (1.00)
Information Technology > Human Computer Interaction (1.00)
(8 more...)

Neural Information Processing SystemsFeb-10-2026, 15:02:02 GMT

CMMA: Benchmarking Multi-Affection Detection in Chinese Multi-Modal Conversations

Human communication has a multi-modal and multi-affect nature.

machine learning, natural language, sentiment, (21 more...)

Country:

Asia > China > Hong Kong (0.05)
Asia > Myanmar > Tanintharyi Region > Dawei (0.04)
Asia > China > Tianjin Province > Tianjin (0.04)
(6 more...)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)
Information Technology > Communications > Social Media (0.69)

Neural Information Processing SystemsDec-24-2025, 18:32:42 GMT

CMMA: Benchmarking Multi-Affection Detection in Chinese Multi-Modal Conversations

Human communication has a multi-modal and multi-affection nature. The inter-relatedness of different emotions and sentiments poses a challenge to jointly detect multiple human affections with multi-modal clues. Recent advances in this field employed multi-task learning paradigms to render the inter-relatedness across tasks, but the scarcity of publicly available resources sets a limit to the potential of works. To fill this gap, we build the first Chinese Multi-modal Multi-Affection conversation (CMMA) dataset, which contains 3,000 multi-party conversations and 21,795 multi-modal utterances collected from various styles of TV-series. CMMA contains a wide variety of affection labels, including sentiment, emotion, sarcasm and humor, as well as the novel inter-correlations values between certain pairs of tasks. Moreover, it provides the topic and speaker information in conversations, which promotes better modeling of conversational context. On the dataset, we empirically analyze the influence of different data modalities and conversational contexts on different affection analysis tasks, and exhibit the practical benefit of inter-task correlations.

artificial intelligence, machine learning, proceedings, (6 more...)

Country: Asia > Myanmar > Tanintharyi Region > Dawei (0.07)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

arXiv.org Artificial IntelligenceNov-18-2025

PRISM of Opinions: A Persona-Reasoned Multimodal Framework for User-centric Conversational Stance Detection

Wang, Bingbing, Bai, Zhixin, Jin, Zhengda, Wang, Zihan, Song, Xintong, Lin, Jingjie, Li, Sixuan, Li, Jing, Xu, Ruifeng

The rapid proliferation of multimodal social media content has driven research in Multimodal Conversational Stance Detection (MCSD), which aims to interpret users' attitudes toward specific targets within complex discussions. However, existing studies remain limited by: 1) pseudo-multimodality, where visual cues appear only in source posts while comments are treated as text-only, misaligning with real-world multimodal interactions; and 2) user homogeneity, where diverse users are treated uniformly, neglecting personal traits that shape stance expression. T o address these issues, we introduce U-MStance, the first user-centric MCSD dataset, containing over 40k annotated comments across six real-world targets. W e further propose PRISM, a Persona-Reasoned multImodal Stance Model for MCSD. PRISM first derives longitudinal user personas from historical posts and comments to capture individual traits, then aligns textual and visual cues within conversational context via Chain-of-Thought to bridge semantic and pragmatic gaps across modalities. Finally, a mutual task reinforcement mechanism is employed to jointly optimize stance detection and stance-aware response generation for bidirectional knowledge transfer . Experiments on U-MStance demonstrate that PRISM yields significant gains over strong baselines, underscoring the effectiveness of user-centric and context-grounded multimodal reasoning for realistic stance understanding.

large language model, machine learning, natural language, (20 more...)

2511.1213

Country: Asia > China (0.47)

Genre: Research Report (0.82)

Industry: Information Technology (0.68)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Neural Information Processing SystemsOct-10-2025, 15:19:22 GMT

The PRISM Alignment Dataset: What Participatory, Representative and Individualised Human Feedback Reveals About the Subjective and Multicultural Alignment of Large Language Models Hannah Rose Kirk 1

Human feedback is central to the alignment of Large Language Models (LLMs).

huggingface api open ac cess, individualised human feedback reveal, subjective and multicultural alignment, (16 more...)

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.27)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
Europe > Russia (0.14)
(92 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)
Questionnaire & Opinion Survey (1.00)
(2 more...)

Industry:

Media (1.00)
Leisure & Entertainment (1.00)
Information Technology > Security & Privacy (1.00)
(10 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Neural Information Processing SystemsOct-8-2025, 12:14:29 GMT

CMMA: Benchmarking Multi-Affection Detection in Chinese Multi-Modal Conversations

Human communication has a multi-modal and multi-affect nature.

annotation, dataset, sentiment, (16 more...)

Country:

Asia > China > Hong Kong (0.05)
Asia > Myanmar > Tanintharyi Region > Dawei (0.04)
Asia > China > Tianjin Province > Tianjin (0.04)
(6 more...)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)
Information Technology > Communications > Social Media (0.69)

Perera, Manoj Madushanka, Mahmood, Adnan, Wijethilake, Kasun Eranda, Islam, Fahmida, Tahermazandarani, Maryam, Sheng, Quan Z.

A Survey of the State-of-the-Art in Conversational Question Answering Systems

arXiv.org Artificial IntelligenceSep-9-2025

Conversational Question Answering (ConvQA) systems have emerged as a pivotal area within Natural Language Processing (NLP) by driving advancements that enable machines to engage in dynamic and context-aware conversations. These capabilities are increasingly being applied across various domains, i.e., customer support, education, legal, and healthcare where maintaining a coherent and relevant conversation is essential. Building on recent advancements, this survey provides a comprehensive analysis of the state-of-the-art in ConvQA. This survey begins by examining the core components of ConvQA systems, i.e., history selection, question understanding, and answer prediction, highlighting their interplay in ensuring coherence and relevance in multi-turn conversations. It further investigates the use of advanced machine learning techniques, including but not limited to, reinforcement learning, contrastive learning, and transfer learning to improve ConvQA accuracy and efficiency. The pivotal role of large language models, i.e., RoBERTa, GPT-4, Gemini 2.0 Flash, Mistral 7B, and LLaMA 3, is also explored, thereby showcasing their impact through data scalability and architectural advancements. Additionally, this survey presents a comprehensive analysis of key ConvQA datasets and concludes by outlining open research directions. Overall, this work offers a comprehensive overview of the ConvQA landscape and provides valuable insights to guide future advancements in the field.

large language model, machine learning, question answering, (19 more...)

2509.05716

Country:

North America > United States (1.00)
Europe (1.00)
Asia > Middle East > UAE (0.28)

Genre:

Overview (1.00)
Research Report > Promising Solution (0.92)
Research Report > New Finding (0.67)

Industry:

Education (1.00)
Information Technology > Services (0.46)
Health & Medicine > Diagnostic Medicine (0.45)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Question Answering (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

arXiv.org Artificial IntelligenceAug-19-2025

In-Context Examples Matter: Improving Emotion Recognition in Conversation with Instruction Tuning

Ma, Hui, Zhang, Bo, Hu, Jinpeng, Shi, Zenglin

Emotion recognition in conversation (ERC) aims to identify the emotion of each utterance in a conversation, playing a vital role in empathetic artificial intelligence. With the growing of large language models (LLMs), instruction tuning has emerged as a critical paradigm for ERC. Existing studies mainly focus on multi-stage instruction tuning, which first endows LLMs with speaker characteristics, and then conducts context-aware instruction tuning to comprehend emotional states. However, these methods inherently constrains the capacity to jointly capture the dynamic interaction between speaker characteristics and conversational context, resulting in weak alignment among speaker identity, contextual cues, and emotion states within a unified framework. In this paper, we propose InitERC, a simple yet effective one-stage in-context instruction tuning framework for ERC. InitERC adapts LLMs to learn speaker-context-emotion alignment from context examples via in-context instruction tuning. Specifically, InitERC comprises four components, i.e., demonstration pool construction, in-context example selection, prompt template design, and in-context instruction tuning. To explore the impact of in-context examples, we conduct a comprehensive study on three key factors: retrieval strategy, example ordering, and the number of examples. Extensive experiments on three widely used datasets demonstrate that our proposed InitERC achieves substantial improvements over the state-of-the-art baselines.

large language model, machine learning, natural language, (17 more...)

2508.11889

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.95)

Wang, Hsuan-Yu, Lee, Pei-Ying, Chen, Berlin

Enhancing Speech Emotion Recognition Leveraging Aligning Timestamps of ASR Transcripts and Speaker Diarization

arXiv.org Artificial IntelligenceJul-28-2025

--In this paper, we investigate the impact of incorporating timestamp-based alignment between Automatic Speech Recognition (ASR) transcripts and Speaker Diarization (SD) outputs on Speech Emotion Recognition (SER) accuracy. Misalignment between these two modalities often reduces the reliability of multimodal emotion recognition systems, particularly in conversational contexts. T o address this issue, we introduce an alignment pipeline utilizing pre-trained ASR and speaker diariza-tion models, systematically synchronizing timestamps to generate accurately labeled speaker segments. Our multimodal approach combines textual embeddings extracted via RoBERT a with audio embeddings from Wav2V ec, leveraging cross-attention fusion enhanced by a gating mechanism. Experimental evaluations on the IEMOCAP benchmark dataset demonstrate that precise timestamp alignment improves SER accuracy, outperforming baseline methods that lack synchronization. Speech Emotion Recognition (SER) has gained substantial research attention, particularly for its applications in human-computer interaction.

artificial intelligence, machine learning, natural language, (14 more...)

2507.19356

Country: Asia > Taiwan (0.15)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Emotion (1.00)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.88)

Ulmer, Dennis, Lorson, Alexandra, Titov, Ivan, Hardmeier, Christian

Anthropomimetic Uncertainty: What Verbalized Uncertainty in Language Models is Missing

arXiv.org Artificial IntelligenceJul-16-2025

Human users increasingly rely on natural language interactions with large language models (LLMs) in order to receive help on a large variety of tasks and problems. However, the trustworthiness and perceived legitimacy of LLMs is undermined by the fact that their output is frequently stated in very confident terms, even when its accuracy is questionable. Therefore, there is a need to signal the confidence of the language model to a user in order to reap the benefits of human-machine collaboration and mitigate potential harms. Verbalized uncertainty is the expression of confidence with linguistic means, an approach that integrates perfectly into language-based interfaces. Nevertheless, most recent research in natural language processing (NLP) overlooks the nuances surrounding human uncertainty communication and the data biases that influence machine uncertainty communication. We argue for anthropomimetic uncertainty, meaning that intuitive and trustworthy uncertainty communication requires a degree of linguistic authenticity and personalization to the user, which could be achieved by emulating human communication. We present a thorough overview over the research in human uncertainty communication, survey ongoing research, and perform additional analyses to demonstrate so-far overlooked biases in verbalized uncertainty. We conclude by pointing out unique factors in human-machine communication of uncertainty and deconstruct anthropomimetic uncertainty into future research directions for NLP.

large language model, machine learning, natural language, (19 more...)